Data Exploration

Our exploratory data analysis examines patterns that inform both research questions, which concern usage context and feature adoption. We organize the exploration into the following categories:

Blocked Website Patterns

Figure 5: Website Usage Analysis - Distribution of blocked websites by category (left) and frequency of individual websites (right)

The analysis of blocked websites reveals distinct patterns in how users interact with the Jargon extension. Professional tools, particularly Salesforce and AI platforms, are the most frequently blocked, which suggests that users tend to disable Jargon during work-related activities. The presence of development environment blocks indicates that some users are technical professionals, though this group represents only a modest portion of the overall user base. Educational content also features prominently among blocked websites: users often disable the extension on documentation sites and learning platforms, possibly to maintain focus during concentrated study sessions.

However, the data contain only 27 blocked sites across 92 users. The blocking feature is therefore not widely used, and the patterns above should be treated as tentative. They may not represent the broader user population and should not be generalized without more data.
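To put the sample size in perspective, the two reported counts can be turned into a per-user rate. The sketch below is illustrative only: the function name `blocking_summary` is hypothetical, and it assumes that the 27 blocked sites are individual user-site entries (if the figure instead counts distinct sites shared by several users, the upper bound does not apply).

```python
# Counts reported in the text: 27 blocked sites across 92 users.
BLOCKED_SITES = 27
USERS = 92


def blocking_summary(blocked_sites: int, users: int) -> dict:
    """Mean blocked sites per user and an upper bound on the share of blocking users."""
    mean_per_user = blocked_sites / users
    # Assumption: each blocked site is one user-site entry. If every entry
    # belonged to a different user, this share of users would have blocked
    # something; the true share can only be equal or lower.
    max_share_blocking = min(blocked_sites, users) / users
    return {"mean_per_user": mean_per_user, "max_share_blocking": max_share_blocking}


summary = blocking_summary(BLOCKED_SITES, USERS)
print(f"{summary['mean_per_user']:.2f} blocked sites per user")
print(f"at most {summary['max_share_blocking']:.0%} of users blocked a site")
```

Under that assumption, fewer than a third of users can have used the blocking feature at all, which is consistent with the caution expressed above.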

Language Mode Usage

Figure 6: Scatter plot showing the relationship between user adoption and question generation across different language modes

The scatter plot highlights key patterns in language mode usage:

  • Spanish is the most active mode, with the highest number of questions (~800) and users (~30).
  • GlizzyTalk and Tamil show moderate engagement (~300 questions each).
  • Korean and GRE Vocabulary form a middle tier (~200 questions).
  • Most other languages have low adoption, with few users and few questions.
  • Some modes (e.g., Tamil) have high question counts despite fewer users, indicating intensive use by dedicated learners.

Overall, while usage intensity and adoption vary widely across languages, traditional language learning modes drive most activity.
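The distinction between adoption (users) and intensity (questions per user) can be made concrete with a simple ratio. The sketch below uses only the approximate values quoted above for Spanish (~800 questions, ~30 users) and Tamil (~300 questions); the helper name `questions_per_user` is hypothetical, and because the figures are read off a plot, the results are rough.

```python
def questions_per_user(questions: int, users: int) -> float:
    """Average number of generated questions per user for one language mode."""
    if users <= 0:
        raise ValueError("users must be positive")
    return questions / users


# Approximate values read off the scatter plot for Spanish.
spanish_intensity = questions_per_user(800, 30)
print(f"Spanish: about {spanish_intensity:.0f} questions per user")

# Tamil's user count is not stated in the text, so instead of guessing it we
# compute the user count at which Tamil (~300 questions) would match Spanish's
# intensity. Fewer users than this means more intensive use per user.
tamil_break_even_users = 300 / spanish_intensity
print(f"Tamil matches Spanish's intensity at about {tamil_break_even_users:.2f} users")
```

A mode with a smaller user base than this break-even value would show more intensive per-user use than Spanish, even though its total question count is lower.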

Words Frequency Analysis

Daily Activity

Weekly Activity

User Engagement Distribution

Figure 10: Distribution of key engagement metrics across users, showing individual violin plots for each metric with median and interquartile range (IQR) statistics. Each plot uses a distinct color and includes summary statistics.

The violin plots show how each user engagement metric is distributed:

  • Generated Questions & Answered Questions: Most users generate and answer only a small number of questions, as shown by the wide base near zero. A few users are highly active, producing a long tail of outliers with much higher counts.
  • Blocked Sites: The vast majority of users do not block any sites (distribution concentrated at zero), with only a handful blocking more than one site.
  • Levels Attempted: Most users attempt only one level, with very few exploring multiple levels. The distribution is sharply peaked at one, with a small tail for higher values.

Overall, the violin plots highlight that engagement is highly skewed: most users interact minimally, while a small subset are much more active or exploratory. This pattern is consistent across all four metrics.
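The median and IQR statistics shown with each violin plot can be computed as in the minimal sketch below. It uses Python's standard `statistics` module; the function name `summarize` and the sample `toy_counts` are hypothetical and are not taken from the Jargon dataset. The quartile method used for the actual figure is not stated, so the `inclusive` method here is an assumption.

```python
import statistics


def summarize(values):
    """Median, quartiles, IQR, and mean for one engagement metric."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mean": statistics.fmean(values),
    }


# Hypothetical per-user question counts (illustrative only, not real data):
# most users near zero, one highly active user in the tail.
toy_counts = [0, 0, 1, 1, 2, 2, 3, 5, 40]
stats = summarize(toy_counts)
print(stats)

# A mean well above the median is a quick indicator of right skew.
print("right-skewed:", stats["mean"] > stats["median"])
```

For skewed data of this kind, the median and IQR describe the typical user better than the mean, which a few highly active users pull upward.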